{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "BVp8vazXYOz-"
},
"source": [
"##### Copyright 2024 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "DBaXaQ_PYT4p"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XSkl5h3dZMo9"
},
"source": [
"# PaliGemma2 - Run with Transformers.js\n",
"\n",
"Author: Sitam Meur\n",
"\n",
"* GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n",
"* X: [@sitammeur](https://x.com/sitammeur)\n",
"\n",
"Description: This notebook demonstrates how you can run inference on PaliGemma2 model using Node.js and [Transformers.js](https://huggingface.co/docs/transformers.js/index). Transformers.js lets you run Hugging Face's transformer models directly in browser, offering a JavaScript API similar to Python's. It supports NLP, computer vision, audio, and multimodal tasks using ONNX Runtime and allows easy conversion of PyTorch, TensorFlow, and JAX models.\n",
"\n",
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/PaliGemma/[PaliGemma_2]Using_with_Transformersjs.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>"
]
},
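{
"cell_type": "markdown",
"metadata": {},
"source": [
"To give a feel for the JavaScript API mentioned above, here is a minimal, hypothetical sketch of the high-level pipeline API (for illustration only; the model ID and image URL are just examples, and this snippet is not used later in the notebook):\n",
"\n",
"```js\n",
"// Hypothetical sketch: image captioning via the pipeline API\n",
"import { pipeline } from \"@huggingface/transformers\";\n",
"\n",
"const captioner = await pipeline(\"image-to-text\", \"Xenova/vit-gpt2-image-captioning\");\n",
"const output = await captioner(\"https://example.com/cat.jpg\");\n",
"console.log(output); // e.g. [{ generated_text: \"...\" }]\n",
"```\n",
"\n",
"This notebook instead uses the lower-level `AutoProcessor` and `PaliGemmaForConditionalGeneration` classes, since PaliGemma 2 takes both an image and a text prompt."
]
},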
{
"cell_type": "markdown",
"metadata": {
"id": "Ftkrrn3aZyAl"
},
"source": [
"## Setup\n",
"\n",
"### Select the Colab runtime\n",
"To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the PaliGemma 2 model. In this case, you can use CPU runtime:\n",
"\n",
"1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
"2. Select **Change runtime type**.\n",
"3. Under **Hardware accelerator**, select **CPU**."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eCJ7yo3-Zzdj"
},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ET_KH77YZ5lc"
},
"source": [
"Let's get started with installing the dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d7Ds4h2q0ItU"
},
"outputs": [],
"source": [
"# Install Node.js\n",
"!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -\n",
"!sudo apt-get install -y nodejs"
]
},
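{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, verify that the installation worked before continuing. Transformers.js v3 targets recent Node.js releases, so the Node.js 20.x install above should be sufficient."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# (Optional) Verify the Node.js and npm versions\n",
"!node -v\n",
"!npm -v"
]
},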
{
"cell_type": "markdown",
"metadata": {
"id": "ObI8d_Rwa_nn"
},
"source": [
"## Create Node.js project\n",
"\n",
"Create a new Node.js project and install the required transformers package via [NPM](https://www.npmjs.com/package/@huggingface/transformers)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5nmPcg5J0cYj"
},
"outputs": [],
"source": [
"# Create project directory\n",
"!mkdir paligemma2-node\n",
"%cd paligemma2-node\n",
"\n",
"# Initialize NPM project\n",
"!npm init -y\n",
"!npm i @huggingface/transformers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gGYVoOXA3T7-"
},
"outputs": [],
"source": [
"%%writefile package.json\n",
"\n",
"{\n",
" \"name\": \"paligemma2-node\",\n",
" \"version\": \"1.0.0\",\n",
" \"main\": \"index.js\",\n",
" \"type\": \"module\",\n",
" \"scripts\": {\n",
" \"test\": \"echo \\\"Error: no test specified\\\" && exit 1\"\n",
" },\n",
" \"keywords\": [],\n",
" \"author\": \"\",\n",
" \"license\": \"ISC\",\n",
" \"description\": \"\",\n",
" \"dependencies\": {\n",
" \"@huggingface/transformers\": \"^3.1.2\"\n",
" }\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "U9mJ6Pi3bxCY"
},
"source": [
"## Transformers.js Inference\n",
"\n",
"Now, let's run inference on the PaliGemma2 model using Transformers.js. First, load the model and processor and then prepare inputs (Text query + Image) to run inference and get the output as desired image caption. For reference, you can check the model's page on the Hugging Face model hub under ONNX models section [here](https://huggingface.co/onnx-community/paligemma2-3b-pt-224)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8149e6d989e6"
},
"outputs": [],
"source": [
"# Show the image from the URL\n",
"from PIL import Image\n",
"import requests\n",
"\n",
"url = \"https://jethac.github.io/assets/juice.jpg\"\n",
"img = Image.open(requests.get(url, stream=True).raw) \n",
"img"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dbed89fc4a5f"
},
"source": [
"It's an image of a cat sitting on a bag, now let's see what the model predicts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-Gy2LdaY3Iqh"
},
"outputs": [],
"source": [
"%%writefile index.js\n",
"\n",
"// Import the required modules\n",
"import {\n",
" AutoProcessor,\n",
" PaliGemmaForConditionalGeneration,\n",
" load_image,\n",
"} from \"@huggingface/transformers\";\n",
"\n",
"// Load processor and model\n",
"const model_id = \"onnx-community/paligemma2-3b-pt-224\"; // Change this to use a different PaliGemma model\n",
"const processor = await AutoProcessor.from_pretrained(model_id);\n",
"const model = await PaliGemmaForConditionalGeneration.from_pretrained(\n",
" model_id,\n",
" {\n",
" dtype: {\n",
" embed_tokens: \"q8\", // or 'fp16'\n",
" vision_encoder: \"q8\", // or 'q4', 'fp16'\n",
" decoder_model_merged: \"q4\", // or 'q4f16'\n",
" },\n",
" }\n",
");\n",
"console.log(\"Model and processor loaded successfully!\");\n",
"\n",
"// Prepare inputs\n",
"const url = \"https://jethac.github.io/assets/juice.jpg\";\n",
"const raw_image = await load_image(url);\n",
"const prompt = \"<image>\"; // Caption, by default\n",
"const inputs = await processor(raw_image, prompt);\n",
"console.log(\"Inputs prepared successfully!\");\n",
"\n",
"try {\n",
" // Generate a response\n",
" const response = await model.generate({\n",
" ...inputs,\n",
" max_new_tokens: 100, // Maximum number of tokens to generate\n",
" });\n",
"\n",
" // Extract generated IDs from the response\n",
" const generatedIds = response.slice(null, [inputs.input_ids.dims[1], null]);\n",
"\n",
" // Decode the generated IDs to get the answer\n",
" const decodedAnswer = processor.batch_decode(generatedIds, {\n",
" skip_special_tokens: true,\n",
" });\n",
"\n",
" // Log the generated caption\n",
" console.log(\"Generated caption:\", decodedAnswer[0]);\n",
"} catch (error) {\n",
" console.error(\"Error generating response:\", error);\n",
"}"
]
},
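{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the prompt: with the pretrained (`pt`) checkpoint, the bare `<image>` prompt produces a caption. PaliGemma also understands task prefixes placed after the `<image>` token (the PaliGemma documentation lists the full set); hypothetically, alternative prompts look like this:\n",
"\n",
"```js\n",
"// Hypothetical alternative prompts (not run in this notebook)\n",
"const captionPrompt = \"<image>caption en\"; // caption in English\n",
"const detectPrompt = \"<image>detect cat\"; // locate \"cat\" in the image\n",
"```\n",
"\n",
"A note on quantization: the `dtype` options (`q8`, `q4`, `fp16`, `q4f16`) trade some accuracy for smaller downloads and lower memory use, which helps keep CPU inference in Colab practical."
]
},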
{
"cell_type": "markdown",
"metadata": {
"id": "Zz81XOKebf_j"
},
"source": [
"## Run Application"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PmVUVFDf3-yE"
},
"outputs": [],
"source": [
"# Run the node.js application\n",
"!node index.js"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gFgOMvLmcVnY"
},
"source": [
"## Conclusion\n",
"\n",
"Congratulations! You have successfully run inference on PaliGemma2 model using Transformers.js via Node.js environment. You can now integrate this into your projects."
]
}
],
"metadata": {
"colab": {
"name": "[PaliGemma_2]Using_with_Transformersjs.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}